Near-Optimal Bounds for Cross-Validation via Loss Stability
Abstract
Multi-fold cross-validation is an established practice to estimate the error rate of a learning algorithm. Quantifying the variance reduction gains due to cross-validation has been challenging due to the inherent correlations introduced by the folds. In this work we introduce a new and weak measure called loss stability and relate the cross-validation performance to this measure; we also establish that this relationship is near-optimal. Our work thus quantitatively improves the current best bounds on cross-validation.
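For readers unfamiliar with the procedure being analyzed, the following is a minimal sketch of a multi-fold cross-validation error estimate. It is not taken from the paper: the toy threshold learner, the `(feature, label)` data format, and the names `train_threshold` and `k_fold_error` are all assumptions made for illustration, and the paper's loss stability measure is not implemented here.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, int]           # hypothetical format: (feature, binary label)
Hypothesis = Callable[[float], int]   # maps a feature to a predicted label


def train_threshold(data: Sequence[Example]) -> Hypothesis:
    """Toy learner (an assumption): threshold halfway between the class means."""
    xs0 = [x for x, y in data if y == 0]
    xs1 = [x for x, y in data if y == 1]
    if not xs0 or not xs1:
        # Only one class seen in training: always predict that class.
        only = 1 if xs1 else 0
        return lambda x: only
    m0 = sum(xs0) / len(xs0)
    m1 = sum(xs1) / len(xs1)
    t = (m0 + m1) / 2
    if m1 >= m0:
        return lambda x: int(x > t)
    return lambda x: int(x < t)


def k_fold_error(
    data: Sequence[Example],
    k: int,
    train: Callable[[Sequence[Example]], Hypothesis],
) -> float:
    """Average 0-1 test error over k folds.

    Each fold serves once as the test set for a hypothesis trained on the
    remaining k - 1 folds. Fold sizes differ by at most one example.
    """
    n = len(data)
    if k < 2 or k > n:
        raise ValueError("k must satisfy 2 <= k <= len(data)")
    # Deterministic interleaved split; shuffle the data first if order matters.
    folds: List[Sequence[Example]] = [data[i::k] for i in range(k)]
    fold_errors = []
    for i, test_fold in enumerate(folds):
        train_set = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        h = train(train_set)
        mistakes = sum(h(x) != y for x, y in test_fold)
        fold_errors.append(mistakes / len(test_fold))
    return sum(fold_errors) / k
```

The k fold errors averaged in the last line are computed from hypotheses whose training sets overlap heavily, which is the source of the correlations the abstract refers to.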
Similar resources
Stability of cross-validation and minmax-optimal number of folds
In this paper, we analyze the properties of cross-validation from the perspective of the stability, that is, the difference between the training error and the error of the selected model applied to any other finite sample. In both the i.i.d. and non-i.i.d. cases, we derive the upper bounds of the one-round and average test error, referred to as the one-round/convoluted Rademacher-bounds, to qua...
Cross-Validation and Mean-Square Stability
k-fold cross validation is a popular practical method to get a good estimate of the error rate of a learning algorithm. Here, the set of examples is first partitioned into k equal-sized folds. Each fold acts as a test set for evaluating the hypothesis learned on the other k − 1 folds. The average error across the k hypotheses is used as an estimate of the error rate. Although widely used, espec...
Optimal Placement and Sizing of Multiple Renewable Distributed Generation Units Considering Load Variations Via Dragonfly Optimization Algorithm
The progression towards smart grids, integrating renewable energy resources, has increased the integration of distributed generators (DGs) into power distribution networks. However, several economic and technical challenges can result from the unsuitable incorporation of DGs in existing distribution networks. Therefore, optimal placement and sizing of DGs are of paramount importance to improve ...
Determining an Economically Optimal (N,C) Design via Using Loss Functions
In this paper, we introduce a new sampling plan based on the defective proportion of batch. The proposed sampling plan is based on the distribution function of the proportion defective. A continuous loss function is used to quantify deviations between the proportion defective and its acceptance quality level (AQL). For practical purpose, a sensitivity analysis is carried out on the different v...
Near-Infrared Spectroscopic Analysis of Hemoglobin with Stability Based on Human Hemolysates Samples
Near-infrared (NIR) spectroscopy combined with the partial least-squares (PLS) regression was successfully applied for the rapid quantitative analysis of hemoglobin (HGB) based on human hemolysates samples. Based on the varied divisions for the calibration and prediction sets, an effective modeling approach using stable model parameters was proposed. Among 255 samples, 80 were randomly selected...